Yong Qin

DIFFA-2: A Practical Diffusion Large Language Model for General Audio Understanding

Jan 30, 2026
Via arXiv

Reflecting Twice before Speaking with Empathy: Self-Reflective Alternating Inference for Empathy-Aware End-to-End Spoken Dialogue

Jan 26, 2026
Via arXiv

A data-driven approach to inferring travel trajectory during peak hours in urban rail transit systems

Dec 10, 2025
Via arXiv

Towards Responsible Evaluation for Text-to-Speech

Oct 08, 2025
Via arXiv

Mind the Gap: Data Rewriting for Stable Off-Policy Supervised Fine-Tuning

Sep 18, 2025
Via arXiv

StreamMel: Real-Time Zero-shot Text-to-Speech via Interleaved Continuous Autoregressive Modeling

Jun 14, 2025
Via arXiv

RA-CLAP: Relation-Augmented Emotional Speaking Style Contrastive Language-Audio Pretraining For Speech Retrieval

May 26, 2025
Via arXiv

Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides

Apr 21, 2025
Via arXiv

SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors

Mar 20, 2025
Via arXiv

CS-Dialogue: A 104-Hour Dataset of Spontaneous Mandarin-English Code-Switching Dialogues for Speech Recognition

Feb 26, 2025
Via arXiv